
    Artificial Intelligence

    Much of the natural language text found on the web contains various kinds of common-sense knowledge, and such information is potentially an important supplement to more formal approaches to building knowledge bases for Semantic Web applications. Common-sense knowledge is often expressed in the form of generic statements such as "Elephants are mammals." A generic statement is a sentence that talks about kinds rather than individuals. In this thesis, we develop methods for automatically extracting generic statements from unrestricted natural language text and mapping them into an appropriate logical formalism for the Semantic Web. The extraction process uses cascaded transduction rules to identify generic sentences and to extract relations from them. An NLP pipeline and a suite of tools designed for generic manipulation of XML were used to carry out this series of tasks. The Wikipedia XML corpus was adopted for development as a rich source of generic statements, and existing annotations of the ACE 2005 corpus were used to test the identification of generic terms. To identify generic terms, we code a set of morpho-syntactic features into definite transduction rules and apply those rules to the noun groups produced by chunking. Relations are then extracted with the identified terms as arguments; the semantic interpretation underlying the relation extraction is based on what can be called semantic chunking. Finally, we show how the extracted relations can be converted to RDF(S) statements as a knowledge representation for the Semantic Web.
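    The abstract does not spell out the final RDF(S) mapping step, but a minimal sketch of the general idea is shown below, assuming an extracted kind-level relation such as ("elephant", "subClassOf", "mammal"); the namespace, function name, and relation labels are illustrative assumptions, not the thesis's actual implementation, and the example uses the rdflib library rather than the tools named in the abstract.

    # Minimal sketch: mapping an extracted generic relation into RDF(S) triples.
    # "Elephants are mammals" is treated here as a subclass relation between kinds.
    from rdflib import Graph, Namespace
    from rdflib.namespace import RDFS

    EX = Namespace("http://example.org/kinds/")  # hypothetical namespace

    def generic_relation_to_rdfs(subject_kind: str, relation: str, object_kind: str) -> Graph:
        """Encode a kind-level (generic) relation as an RDF(S) statement."""
        g = Graph()
        g.bind("ex", EX)
        subj = EX[subject_kind.capitalize()]
        obj = EX[object_kind.capitalize()]
        if relation == "subClassOf":
            # Generic "X are Y" statements become rdfs:subClassOf links between kinds.
            g.add((subj, RDFS.subClassOf, obj))
        else:
            # Other extracted relations become plain properties in the same namespace.
            g.add((subj, EX[relation], obj))
        return g

    if __name__ == "__main__":
        graph = generic_relation_to_rdfs("elephant", "subClassOf", "mammal")
        print(graph.serialize(format="turtle"))

    Serialized in Turtle, this yields ex:Elephant rdfs:subClassOf ex:Mammal, i.e. a class-level statement about kinds rather than an assertion about any individual elephant.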